arXiv: 1508.02309v2 [q-bio.SC] 21 Aug 2015 


Estimation of the Diffusion Constant from Intermittent Trajectories 
with Variable Position Uncertainties 

Peter Relich, Mark Olah, Patrick Cutler, Keith Lidke 
August 24, 2015 


Abstract 

The movement of a particle described by Brownian motion is quantified by a single parameter, D, the diffusion 
constant. The estimation of D from a discrete sequence of noisy observations is a fundamental problem in biological 
single particle tracking experiments since it can report on the environment and/or the state of the particle itself via 
hydrodynamic radius. Here we present a method to estimate D that takes into account several effects that occur 
in practice, that are important for correct estimation of D, and that have hitherto not been combined together for 
estimation of D. These effects are motion blur from finite integration time of the camera, intermittent trajectories, 
and time-dependent localization uncertainty. Our estimation procedure, a maximum likelihood estimation, follows 
directly from the likelihood expression for a discretely observed Brownian trajectory that explicitly includes these 
effects. The manuscript begins with the formulation of the likelihood expression and then presents three methods to 
find the exact solution. Each method has its own advantages in either computational robustness, theoretical insight, or 
the estimation of hidden variables. We then compare our estimator to previously published estimators using a squared 
log loss function to demonstrate the benefit of including these effects. 
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1 Introduction 


Single Particle Tracking (SPT) is a method to observe and classify the motion of individual particles as trajectories: 
estimates of a particle’s position in a sequence of discrete measurement times. In the field of biological microscopy, 
SPT has been used for finding and analyzing protein motion in heterogeneous environments like the cellular mem¬ 
brane (1,2) and cytoplasm (3, 4). The SPT trajectory information can be used to resolve variations in the individual 
motion of molecules that would otherwise be lost in ensemble imaging techniques. 

In the analysis of trajectories, the pure Brownian motion model is often the first model used to describe a trajectory 
in the absence of prior information about the movement. The behavior of a single particle dominated by Brownian 
motion can be described by a normal distribution with the variance term proportional to a single physical scale param¬ 
eter D , the diffusion constant; which makes Brownian motion the simplest model for describing stochastic motion. 
More complicated behavior could potentially be modeled as Brownian motion with discrete changes in the diffusion 
constant that could be identified with change point analysis (5). Therefore, the estimation of the diffusion constant 
of a particle from discrete, noisy, and possibly short particle trajectories is a fundamental problem in single particle 
tracking. 

In this manuscript, we focus on the likelihood distribution of D. We present a maximum-likelihood-based approach 
for estimating the diffusion constant of a particle given an SPT trajectory that includes the individual localization error 
for each position in the trajectory, the time of the observation, and the camera integration time. Our approach is based 
on a direct solution to the likelihood equation for the observation of a particular trajectory. The need for such an 
estimation procedure has evolved out of the rapid progress that has been made in SPT analysis techniques over the last 
few years (6-10). In particular, some emitter localization techniques can not only accurately resolve the location of 
an emitter to tens of nanometers, but can also reliably estimate the localization error (11). Because the signal to noise 
ratio of a particle can vary significantly from frame to frame in an image sequence (e.g. from varying background, 
or photobleaching of the probe), the localization error reported for each observation in a trajectory can also vary 
significantly from frame to frame. We have therefore developed an estimator that takes into account this information. 

1.1 Background and Related Work 

Historically, one of the primary techniques for estimating the diffusion constant from trajectories relied on a linear 
regression of the mean-squared-displacement (MSD) of the tracked particle coordinates as a function of time lag (12). 
In the absence of measurement errors, the observed MSD for pure Brownian motion scales linearly with time lag 
and intersects at the origin, allowing the direct recovery of the diffusion constant from a linear regression on the well 
sampled data points. It has been shown that a regression of the MSD with an offset parameter can be interpreted to 
account for the cumulative effects of static (13) and dynamic measurement errors (14). If the MSD is built using the 
same data points for multiple time lags, the correlation between MSD values must also be taken into account in the 
regression (12, 15, 16). Although it seems theoretically possible to include individual localization error into the MSD 
regression, to date this has not been described. 

A separate line of work has focused on maximum likelihood approaches to the estimation procedure. A maximum 
likelihood estimator works by finding the maximum of a likelihood function C(D) = P(0 \ D) that gives the proba¬ 
bility of observing a particular trajectory O, given a diffusion constant 1). Ideally this probability should incorporate 
both the variable localization errors of the trajectory and effect of motion-blur. The motion-blur effect arises from the 
fact that each localization is performed on data that is acquired over some non-zero exposure time. Typically camera 
sensors integrate the signal over the exposure time resulting in a blurring of the particle image. This blurring has 
important numerical effects on the likelihood function (17). A specific solution to the likelihood function has been 
accurately derived that incorporates the effects of motion-blur but with the caveat that only a single global localization 
error estimate is used as an input or estimated (16, 18). This estimator is a more robust alternative to the MSD-based 
estimators because it can implement all trajectory information without incurring systematic error when the data is not 
well conditioned for a linear regression. Subsequent work has extended this approach to deal with non-uniformly 
spaced or intermittent trajectories (19), however the particular implementation in (19) did not account for motion blur. 
Maximum likelihood estimators are not the only class of diffusion estimators that have evolved recently; continued 
development on displacement-based estimators has resulted in an estimator that incorporates the effects of covariances 
between sequentially observed displacements (20). 
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In this work we provide a generalized solution to the likelihood function, incorporating variable localization er¬ 
rors and variable displacement periods, which results in an improvement in estimation accuracy for short trajectories, 
trajectories with large variations in localization accuracy, and trajectories with intermittently spaced measurements. 
In Sec. 2 we formulate the diffusion likelihood function to directly incorporate the effects of motion-blur, variable 
localization errors, and intermittent or non-uniformly spaced observations in time. We present three independent so¬ 
lutions to this likelihood function. The first derivation, the recursive method (Sec. 3), is a sequential integration of the 
nuisance parameters and provides the fastest numerical implementation. The second derivation, the Laplace method 
(Sec. 4), utilizes a second order Taylor expansion to express the likelihood as a multivariate Gaussian in the basis of 
integration. The Laplace method additionally returns the maximum likelihood values of the true positions given a 
sampled D. The third derivation, the Markov method (Sec. 5), calculates the characteristic function in order to express 
the likelihood in the basis of displacements. The Markov method allows us to verify that the generalized form of the 
expression derived in (18) is the same distribution as the expressions derived in this manuscript. The Markov method 
was also instrumental in determining the coefficients necessary to reduce the computational complexity of all the 
methods (Sec. 13.2). Each of these derivations leads to an independent, numerically accurate computational algorithm 
for estimating the likelihood of D (Sec. 6), making full use of all the information contained in a noisy trajectory. The 
resulting likelihood calculation allows for robust computations in specific problems, such as a maximum likelihood 
estimator, maximum a posteriori estimate, or change point analysis. We compare the results of our maximum likeli¬ 
hood estimator (MLE) to the current state of the art estimation software (16) with the squared log loss function and 
demonstrate that the additional information provided from the localization errors allows for better estimates of D with 
trajectories parameterized by any non-constant, but otherwise arbitrary distribution of localization variances. 


2 Theory 

If a diffusing particle is accurately and exactly observed at a discrete sequence of N + 1 positions X = {x, }^ 1 at 
times ti, then P(X | D), the probability of sequence X given diffusion constant D , is 

N 

P(X\D) = ]JP( x,:+i | Xj )■ (1) 

i—l 

In Eq. 1, P(x ; + 1 | X;) = P(x,; + i | Xi, D) is the probability density of each discrete jump from X; — > x.; + i over time 
step 5ti = ti -)_i — ti, given diffusion constant D. 

When measured experimentally, however, the true positions X are never known exactly, but are related to N 
observed positions O by some distribution P(o, | x,.x, +1 ), where the dependence on both x, and x i+1 arises from 
the effects of exposure time integration by the observation apparatus which will be dealt with in detail later. Under this 
experimental model, P( O, X | D), the combined likelihood of the observed positions O and the actual positions X 
is a product of the observation probability densities /To, | x,, x, + 1 ) and the diffusion transition probability densities 
P(xi + i | x,) for each of the N observed positions and displacements, 

N 

p(o, X |P)=n P(Oi | X-i, X 2 ;_|_i) P(x i+ 1 | X 2 ) . (2) 

i—l 

Since X is unknown for experimental data, we integrate Eq. 2 over all possible X to marginalize out the dependence 
on X, and write the diffusion likelihood as an integral over the space of all X-values, 


N 


P(0 | D) = / dXP(O.X | D) = /dXp[P(oi|x i ,x i+1 )P(x i+1 |x i ). 


i—l 


Experimental data typically involves trajectories with two or three spatial dimensions. For diffusion in an isotropic 
medium and particle uncertainties given as normal distributions with no covariance among the spatial dimensions, 
the probability distribution of a particular displacement in each dimension is separable. Thus, if T is the number of 
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dimensions, then 


P(Q\D)= l[P(O n \D). 


(3) 


n— 1 


Hence, it is sufficient to only consider the estimation problem in the one-dimensional (ID) case O = { Oi }^ =1 , and 


N 

P(0\D)= / dX Y[P(oi\xi,x i+ i) P(x i+ i\xi). 


(4) 


2.1 Accounting for the effects of exposure time integration 

Equation 40 is the fundamental description of the likelihood of diffusion constant D given observations (). Unfortu¬ 
nately, solving for this expression explicitly is difficult because every o, term is dependent on both x, and x»+i. This 
is because the estimate of o,’s position is typically made from data collected over an exposure time 0 < t e < t l+ t — t, . 
If the observational apparatus is a camera sensor, the signal will be integrated over the frame, resulting in a motion- 
blurred image of the moving particle, hence the observed location is conditional upon the particle’s true position at the 
beginning (#*) and end (x; + i) of the frame. 

In the case where exposure time t e goes to 0, but dt, = /: ?+1 — t, remains constant, the motion-blur effect is no 
longer present, so the observed location o; depends only on position x t . 


N 


N-l 


p(o i d)= dxn p( 0i i Xi ) n p{xj+ 11 


(5) 


i=i 


Without the additional dependence on Xj+i, the methods required to solve the integral in Eq. 5 are simpler. In order 
to use this simpler representation, we will transform Eq. 40 into a form which resembles Eq. 5, and seek functions 

M(oi, Xi) and T{xj + \,Xj) such that 


N 


N 


N-l 


P(0 | D) = [ dX T[ P(oi\xi,x i+ i) P(x i+1 \xi) = [ dX\\M{oi,Xi)\\T(xj + -L,: 

4R n+1 f =l • y R« , =1 j=1 


(6) 


The function T(x, + -i . x t ) stands for the transition probability; it is simply the probability of a particle diffusing with 
constant D moving from Xi to Xj+i, over time The function A4(oj,Xj) stands for the measurement probability 
and it encapsulates the net effect of both the measurement localization error and the motion-blur. The details of 
the representation equivalence of Eq. 6 are important for correctness, but they also unnecessarily complicate the 
exposition, and so can be found in Sec. 13. Other authors (14, 18) have investigated the motion-blur effects of exposure 
time integration, and found that the effect can be approximated by an effective decrease in variance of the measurement 
localization error, dependent on diffusion constant D and exposure time t e . Our derivations in Sec. 13 agrees with the 
effective correction factor in (14, 18), and more importantly provides a form for the diffusion likelihood that is directly 
amenable to the solution techniques we employ in Secs. 3,4, and 5. 

The result of the transformation of Eq. 6 is that the effective measurement function AT; and the transition function 
T; take the form of normalized Gaussians. We use the notation 


Af(a,a 0 ,ij) = 


1 


■J2TT1J 


exp 


(a - ap) 
2 V 


21 


to represent the normalized Gaussian function with variance r] = a 2 centered around mean a 0 considered as a function 
of a, ao, and //. Using this notation, we can succinctly represent the measurement and transition functions as. 


% = Ti{x i+ \,Xi) =Af(x i+1 ,Xi,uJi (£>)), for 1 < i < N — 1, and 

Mi = Mi(oi,Xi ) =A f(oi,Xi,£i(D)), for 1 < i < N. 


(7) 

( 8 ) 


4 



The transition functions (Eq. 46), are unaffected by the motion blur transformation and their Gaussian representation 
follows directly from the normally distributed displacements of diffusive processes, hence the variance is 


<jJi(D) = 2D5ti. 


For the measurement functions (Eq. 45), the variance £i(D), is the variance due to measurement error, v t , combined 
with a correction for the effect of motion-blur that is dependent on diffusion constant D and exposure time t e , 


£i{D) = Vi - Dt e / 3. 


where the factor of 1/3 comes from the continuous limit integration of photon emissions for averaged Brownian 
trajectories (Sec. 13.1). It is important to note that the independence of t e and 5ti allows for gaps in the trajectories, 
where 5ti could span a duration of multiple frames but t e is the exposure time of a single frame. 

The result is that Eq. 6 allows us to express the likelihood function exactly in a simple form that deals directly with 
variable localization error, motion-blur effects, and missing or irregularly spaced trajectory localizations, 

r N N-l 

P{0\D)= / dxT[Mi TT Tj. (9) 

r=i f=i 


3 Recursive Method 


The notation for the transition and measurement functions allows us to define the likelihood function C(D), by writing 
Eq. 9 in a form that emphasizes the dependencies on each marginalized position Xi, 


C{D) = P(0 | D) 


d xn A4]v 


<1xn-i Mn-iTn-i ■ ■ ■ 


dx 2 M 2 T 2 


dxi MiTi- 


( 10 ) 


The form of Eq. 10 leads to a direct recursive solution, taking into account the properties of integrals over products 
of normalized Gaussian functions. Define Ci as the sub-integrand of C(D) considering only the first i observations. 


Pi(D,X2) = J M\T\dxi, 

£i(D,x i+1 ) = JMiTiCi-idxi , 2 < i < N - 1 

C(D) = Cn(D) = JMN^N-idxN ■ 


Now, consider that the integral of a product of two normalized Gaussians sharing a parameter x, with means and 
variances denoted by c, and ip, respectively, is itself a normalized Gaussian (Sec. 11), 



2 

J\f(x,Ci,<pi) 

i =1 


Af{c 1 ,c 2 ,ipi +<^ 2 ). 


Hence, C\ is a normalized Gaussian in parameter x 2 . 


( 11 ) 


Ci(D,x 2 ) 


J dxiMiTi 


Af(o 1 ,x 2 ,£i + wi). 


This implies that C 2 {D,Xz) which is an integral over positions x 2 , can now be written as an integral over three 
normalized Gaussians, all of which share parameter x 2 , 


£ 2 {d, X 3 ) 


= / dx 2 A^272 


da’i M\T\ 


= / dx 2 A^ 2 7 2 £i. 


( 12 ) 
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Similarly, the integral of a product of three normalized Gaussians sharing the integrated parameter is itself a product 
of two normalized Gaussians (Sec. 11), 

f 3 

/ da; n N(ci,x,<pi) = Af(ci,c 2 ,<pi + <P 2 )A/'(c 3 , c', 7), (13) 

i =1 

where, 

, Ci ^ 2 + C2V1 , + ^ 1^3 + <M 2<^3 

c = -, and 7 = -. 

<P 1 + C2 <£1 + ^2 

Hence, applying Eq. 13 to Eq. 12, we find that 

C 2 = J dx 2 M 2 T 2 £i = ^(01,02, (£1 + wi) +£ 2 )A/'(a; 3 ,^ 2 ,? 7 2 ) 

is a product of two normalized Gaussians, one of which depends on X 3 and the other is a constant with respect to X. 
The variables /t 2 and // 2 follow from Eq. 11 and Eq. 13, and are the second pair of values in a recursive solution. Since 
all subsequent integrals, barring the last integral, can be expressed as a product of three normalized Gaussians, we can 
express the recursion variables as 


Mi = 01 , j?i=£i+cui, and ai = 771 + e 2 , 


(14) 


and for 2 < i < N — 1, 


M? —lG 4“ Vi—lOi , 

Mi = -, Vi = -h u>i, and = rji + G+i- 

Oii -1 at -1 

Finally, this allows us to express our integrands L % as 

Ci = JT(x 2 ,v i,r?t) 

i—1 

Ci J\f[Xi-i-i , Vi 1 Vi) 11 Oik), 2 t A 1 

JV-1 


k =1 


C(D) £jv — 11 A / - Vkj ctfc)- 


(15) 


(16) 


fc=i 


Equation 16 is the final form of the recursive solution for C(D) which is simply the product of N 1 normalized 
Gaussians each of which has parameters which come from a recursive relationship on o t , £,, and iu, . The value of D 
that maximizes C{I)) is the maximum likelihood estimate. 


4 Laplace Method 

An independent solution for Eq. 9 can be obtained using the Laplace method which is based on integrating the second 
moment of the Taylor expansion of the exponential component of a function. Given that the second moment of a 
Taylor expansion is quadratic, this ensures that the function under the integral is always a Gaussian function (21). 
Another caveat to the Laplace method is that the Taylor expansion has to occur about the peak of the exponential, so 
that the first moment of the Taylor expansion goes to 0. To perform the Laplace method, we express our likelihood 
C(D) = P(0 | D) in terms of exponential and non-exponential components 

C(D) = JdXf(X) = J dX h(X) exp [-g(X)}, 
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where f(x) is simply the integrand of Eq. 9, 


N N-l 

f(X ) = h(X)e X p[-g(X)} =l[M i l[T j . 

i= 1 3 =1 

Thus, using equations 45 and 46, we see that h = /i(X) is independent of X and g(X) is quadratic in X, 


(17) 


N 


1 JV_1 1 

n 


H V / 27rei Of V27TW,; ’ 


S(*)=E 


(Oj ~ X jf 

2 


E 


Q ; ; + i - a:j) 2 
2cu,- 


The maximum likelihood estimate X of the actual positions X, given D and O will be wherever the integrand is 
maximized, and since g(X) > 0, 

X = argmax/(X) = argming(X). 






Now, given that g(X) is quadratic, a second order Taylor expansion of g about X is exact and the Laplace method 
will provide an exact solution for C{D) as the integral can be shown to take the form of a standard Gaussian integral. 
To see this, we write out the second order Taylor expansion 


dX /(X) = h dX exp 


-g(X) - Vg(X){X - X) - * (X - XyXXg(X)(X - X) 


(18) 


Since X is the minima of g(X), the gradient Vg(X) = 0, and we can rearrange Eq. 18 to extract all terms independent 
of X, 

— x(X — X) T VV<?(X)(X — X) . (19) 


dX/(X)=/(X) /dXexp 


Furthermore, since h is independent of X, we know that — VV In /(X) = VVc/(X) = M, where M can be thought 
of as the inverse of the covariance matrix for the multivariate Gaussian, or equivalently as the Hessian matrix of 
— In /(X). Substituting M for VVg(X) in Eq. 19 we are left with a Gaussian integral with the solution. 


C{D) = /(X) / dX exp 


-~ 2 (x-xyM(x-x) 


= f(X) 


(2?r) 


N 


det M 

The Hessian matrix M is independent of X and is symmetric tri-diagonal with non-zero elements, 

1 


Mu = - 


M iti = - 


M NyN = — 


Mi,i +1 ^ — — 


d 2 In/ 

1 

dx\ 

£l 

d 2 In/ 

1 

dx 2 

Si 

d 2 In/ 

1 

dx 2 N 

£N 

d 2 In/ 


dxidx i+ 1 



1 


2 < i < N- 1 


Wj 1 

1 


£JV WAT-1 

1: 

UJj 


( 20 ) 


( 21 ) 


We also require X to compute Eq. 20, which can be solved for with the relation —V In /(X) = 0 (Sec. 14), giving 


X = M _1 0, 


( 22 ) 


where © = {#* = 0 ,/cJ;^-,. 
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5 Markov Method 


In this section we present a derivation of the likelihood function £(I)) utilizing a technique developed by Andrey 
Markov (22) and generalized by Chandrasekhar (23). Markov’s method allows us to transform P(0 \ D) (Eq. 9) from 
a function of the N observed positions O = {o,}^ into a function of the N — 1 discrete steps (displacements) 
between subsequent observations 

S , = {s i = o i+1 -o i }^ 1 . 

This is possible because the spatial invariance of diffusion means £(D) depends not on the absolute spatial positions 
O, but only their relative displacements, S. Thus we should expect that P(0 \ D) can also be expressed as P(S \ D ), 
however Eq. 10, which defines P{0 \ D), cannot be directly transformed into a function on S. This is where Markov’s 
method allows us to solve for a function P(S \ D) = P(0 \ D) for a given D value. For a particular fixed S and OS' 
of interest, the value of P(S \ D) dS 1 gives the probability that variable S' is within the bounds 

S- -dS < S' < S+ 1 dS. (23) 

2 ~ ~ 2 

More formally, P(S \ D) d,S' is the integral over the volume d,S' around the point of interest S, and we let S' represent 
the variable integrated over, 

rS+^dS 

P(S\D)dS = / ‘ dS'P(0\D). 

Js-^ds 

The issue remains that P{0 \ D) is expressed in a basis of O rather than of S, and integrating with respect to bounds 
in a different basis is non-trivial. In order to circumvent this issue, Markov utilized a product of Dirichlet integrals, 
S(S' / ) = nf^i 1 Cfc( s fc), to expand the limits of integration to all space 


P{S \D)dS = j dS" E(S’)P(0 | D ). (24) 

The idea is that for each dimension of 5, the Dirichlet integral £k{ s ' k ) acts like a continuous indicator function deter¬ 
mining if s' k is within the bounds of Eq. 23, 


6c(4) 



sin(|d s k p k ) 
Pk 


exp [ip k {s' k - 


s fc)] ) 


so that. 


4(4) 


1 s*; - \ds k < s' k < s k + jdsfc 

0 otherwise 


Therefore S(S') is the indicator function acting over the whole space of S', and determining if S' is within the volume 
d,S’ around our point of interest S, 


N -1 


w)= n 4(4)= 


k=l 


N— 1 

/ t = J 1 A 4 e [Sfc - 5 ds fc . s k + ] 

fc=l 

0 otherwise 


This puts Eq. 24 in the form 


P(S \D)dS = / dS" 


rN- 1 


dp 


j-j- 1 sin(kd s k p k ) 


fe=i 


Pk 


exp[ip T (S' ~ S)}P{0\D) 


(25) 


where p = {p k } k= i is a vector of conjugate coordinates to S'. We can then rearrange Eq. 25 to move the integral 
over dS" and all factors dependent on S' into function A (p), 


P(S\D)dS 


t n-i 


dp 


j-j- 1 sin(ids fc p k ) 
,k=i 


Pk 


exp [— tp J S]A(p). 


(26) 



Now, we can interpret A(p) as the characteristic function of P(0 \ D) in the S' basis and it has the form 

A(p) = J dS" exp [ip 1 S']P(0 | D ). (27) 

The form of Eq. 27 implies that A (p) is the inverse Fourier transform of P(0 \ D). Due to the properties of the Fourier 
transform, we assert A(p) is a bounded function with a finite integral because P(0\ D) is expressed as a finite product 
of Gaussians with non-zero variance, hence it is a continuous function (24). Given that j dpA(p) is bounded and dS 1 
is small we can approximate the product of sine functions in Eq. 26 as 

-j—|- sin(|dsfc pk) -i—r dsfc d S 

H J k = H ~T~ = 2^1 

k= 1 r fc=1 

Thus Eq. 26 becomes the Fourier transform of A(p) 

P(S \D)dS = ^_ 1 J dp exp [-*pTS] A (p). (28) 

We are now interested in evaluating A(p) explicitly. To do so we note that J d S' P(0 \ D) = 1 since P(0 \ D) is 
a probability distribution that, as we have argued, can be equivalently expressed in the S basis as P(S \ D) (Sec. 14). 
With this understanding we evaluate A(p), by expanding the exponential under integration in Eq. 27 as a Taylor series 
about the origin 


exp [ip 1 S"] = 1 +1 



N -1 

3 =1 


.2 J 2 


N -2 N— 1 

£ £ 2 pjPks'js' k 

j —1 k—j + l 


o(p 3 


Solving the integral for A (p) with the Taylor expanded exponential term given that we know P{0 \ D) is a nor¬ 
malized probability density under S' allows us to write A(p) in terms of expected values for Sk (using (•) to represent 
expectation). 


AT-l 


A(p) = 1 + ? £ pj (s'j) - 


JV-1 


EdG 2 > 


N—2 N— 1 

£ £ 2 p 1 Pk{s’ j s’ k ) 


0(P 3 ). (29) 




3 = 1 


3 — 1 k—j+l 


We take the approach of solving for the expectation values on S' to see if those results will simplify the expression 
in Eq. 29. If we set one of the observations, Oj, as a constant, we find that we can marginalize all of P(0 \ D) except 
for the terms that are independently represented by the basis we are interested in. In other words, we find that 


{s'i) — J ds 7 j s' i Af(s' i , 0, £i + £i+\ + u>i) — 0 
< S ' 2 > = J dfii s'i 2 J\f(s'i, 0, £i + £i+i + CJi) = £i + £?:+1 + +>i- 
(sWi+i) = J ds i ds i+i s'is' i+1 Af(S b ,0,T, b ) = -£i +1 . 

Where the substitution of variables required for integrating the expectation of (s's' +1 ) induces a bivariate Gaussian 
function with location parameters Sb = [s*, Si+i] T and covariance matrix 




<+i + G + G+l —£i+l 


Furthermore, we find that the separability of two non-adjacent displacements results in the relation (s's' +fc ) = 
(s') (s' i+k ) = 0 for k > 1. Since P(0 \ D) is a multivariate Gaussian with 0 mean, we can apply Isserlis’s theorem to 
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solve for all of the moments of the distribution (25) given knowledge of the second moments of the Fourier transform 
on P(0 | D), which can be expressed as a covariance matrix. This allows us to express Eq. 29 as 


A(p)= exp 



The covariance matrix S is symmetric tri-diagonal, with non-zero elements 


(30) 


— U>i + £j + £j+i 
+1 — £?+!• 


(31) 


The expression in Eq. 30 is well known as the characteristic function of a multivariate Gaussian. 
Eq. 30 into Eq. 28 and factoring out d S gives 


Substituting 


C{D) = P{S\D) 


\/(27r) 


N-l 


det £ 


exp 


— S t Tj~ 1 S 
2 


(32) 


6 Implementation 

We have presented three independent solutions to the experimental diffusion likelihood £(D) (Eq. 10): the recursive 
method (Sec. 3), the Laplace method (Sec. 4), and the Markov method (Sec. 5). While each method requires separate 
consideration, several features are common to all of the implementations. The separability of the problem allows 
us to estimate diffusion constants for any dimensional inputs inputs using the ID algorithms (Eq. 3). The inputs 
to the algorithms are: (1) the observed particle locations, O = (2) the observation times T = 

(3) the measurement variance for each observation V = {v*}^; (4) the exposure of each frame t e \ and (5) one 
or more diffusion constants D at which to evaluate the likelihood. The output for each D value is ln(£( i))). The 
logarithm of the likelihood makes the computation of products and exponentials much faster, and avoids the problem 
of numerical underflow for very small values of £(D). Additionally, because the logarithm is a strictly monotonically 
increasing function, argmax^, £(D) = argmax^, ln(£(Z?)), so the maximum likelihood estimate is identical for the 
log-likelihood. 


6.1 Recursive Method 


The recursive algorithm follows directly from the recursively defined variables (Eqs. 14 and 15), and the expression of 
£{D) as a product of Gaussians (Eq. 16). The recursive expressions for on, ij ,, and /(,, are causal (the *-terms depend 
only on the (i — 1)-terms), enabling their computation in a simple for loop over N. Noting that the logarithm of a 
normalized Gaussian is 


In Af(a, b , v) 


1 

2 


ln(27r) + ln(v) + 


(a — b ) 2 1 


(33) 


we apply Eq. 33 directly to Eq. 16 to arrive at a computationally efficient form for the recursive solution of the log- 
likelihood 


* 1 
In £(D) = ^2 In = -- 

i—1 

Of all the methods, the recursive method is the simplest to implement and the most computationally efficient and 
numerically stable. 

6.2 Laplace Method 

The computational core of the Laplace method centers around the Hessian matrix M (Eq. 21). This matrix is symmet¬ 
ric tri-diagonal, which means all non-zero elements are on the main diagonal and the diagonals immediately above and 


(N- l)ln(27r) 


N-l 

E 

i=1 


N-l 


ln(a») + E 


(Oi+i - /r,:) 5 


i =1 
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below. Using M we can solve the linear system X = Af _1 0 (Eq. 22) to obtain the maximum likelihood estimates X 
for the true particle locations. Typically, solving large linear systems is expensive but since M is tri-diagonal there are 
algorithms to solve this system in linear time (26). We refer the reader to Sec. 15 for the details of tri-diagonal matrix 
algorithms and our implementation. 

Given a solution for X, we can use the definition of f(X) in Eq. 17 along with Eq. 33 to compute 

N , ^ SO N ~ 1 

ln( 27 T£,) + —----f ln( 27 TWi) + 

i=l i=l 

Finally we can compute the log-likelihood using the Laplace solution of Eq. 20, finding that 

~ N 1 

In C(D) = In f{X) + — ln(27r) - - In det M 

1 N —1 N —1 ^ >. 2 N N / ^ \ 2 

= — - (N — 1) ln(27r) + ^ ln(cUj) + ^ ' li+1 -—-b ^ ln(£j) + ^ ———-b In det M 

L i=l i=l i=l i=l J 

(35) 


In /(*) = -- 


' N 

E 

.i=l 


y^ 1 (a?i + i - at;) 2 

i=l 


LJi 


(34) 


6.3 Markov Method 

Finally, the Markov method computation, like the Laplace method, is centered around matrix computations. In this 
case, the matrix of interest is the N — 1 dimensional covariance matrix E (Eq. 31), which also happens to be symmetric 
tri-diagonal, so the same linear-time algorithms used in the Laplace method are applicable (Sec. 15). 

For the Markov method computation we first solve the linear system <I> = E _1 S I , then apply this solution along 
with the tri-diagonal log-determinant algorithm to compute the logarithm of the likelihood expression from Eq. 32, 
giving 


ln£(D) = -1 [(N — 1) ln(27r) -b 5' T< 3> -b In det E]. 


7 Results 

To demonstrate the benefits of including individual localization errors in the estimation, we opt for a loss function to 
evaluate the quality of the maximum likelihood estimators (MLE) under consideration. The mean squared error is a 
popular choice for evaluating estimators, but it is not a sufficient loss function for the estimation of D, where the mean 
squared error shows over penalization for higher estimations as it has a bounded loss at D = 0 and an unbounded loss 
at D = oo. Instead, we will use the squared log loss function (27) 

£(D,D) = (ln(D) -ln(.D)) 2 . 

The squared log loss function is similar to the mean squared error, except that the squared distance is between the 
logarithms of D and D. From (28), we see that the variance term, D, scales with the observed data as a logarithm, 
hence the choice of the squared log loss function represents a metric between the expected data given by D and 1). 

For one set of SPT simulations, the true trajectory coordinates were first generated with the pure diffusion model. 
Full frame motion blur was accounted for and localization errors were independently and identically drawn from 
either a Uniform or a Gamma distribution to test the effects of variable localization errors without attributing success 
to a particular choice of distribution. As a control, a set of simulations with a constant localization error for all 
observations was generated to show that the likelihood distribution represented by the class of diffusion estimators that 
only recognize a Scalar Localization Error (SLE) was exactly the same as the distribution represented by either of our 
three derived methods which recognize a Vector of Localization Errors (VLE). We then used the Gamma and Uniform 
distribution based localization errors to compare the new VLE based estimators that account for individually measured 
localization errors over the SLE based estimators, for which we had input the square root of the mean localization 
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variance as the best representation for the scalar error in all trajectory observations. We performed 10,000 trajectory 
trials for each data point to estimate the risk of using a particular estimator, where risk is defined as R(D, D) = 
(@(D, D)^. The squared log risk is the variance of ln(£>) for an unbiased estimator. 

The simulations with the gamma distributed localization errors were generated according to 

Vv = a ~ Gamma(4, (VV)/4), 

where the standard gamma distribution p.d.f. is Gamma(fc,0) = 9~ k y/V exp(—-\/U /0)/T(k), and (V^V^) rep¬ 
resents the mean error. The simulations with the uniform distributed localization errors were generated according 
to 

W = a ~ Uniform(^ {'/V ), ^{W)), 

where the uniform distribution p.d.f. is Uniform(a, b) = 1/(6 — a) for \/V £ [a, b} and Uniform(a, b) = 0 for all 
other values of \fV. 
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(A) 


(B) 


D ~ 0.100(^m 2 /s), (W) = 0.020(/jm), ft = 0.010(a), IV,,ij, = 10000 


D = 1.000((im7s), N = 50.000, ft = 1.000(s), = 10000 




(C) 


(D) 


D — 0.100(//m 2 /s), {VV) — 0.035(/im), ft = 0.010(s), iV ttW , = 10000 


D = 1.000(//m 2 /s), N = 50.000, ft = 1.000(s), N,^ = 10000 




Figure 1: Comparison of estimation risk of the Vector of Localization Errors (VLE) and Scalar Localization Error 
(SLE) MLEs with localization errors drawn from Gamma or Uniform distributions. The MLE was found with the 
fminbnd command from Matlab on the calculated likelihood distributions with a lower bound of 10 -8 . Log-log 
plots (A) and (C) show the risk of using a particular estimator on trajectories of various lengths with other constant 
parameters set to typical experimental values with standard errors drawn from gamma (A) or uniform (C) distributions. 
Log-log plots (B) and (D) show the risk of using a particular estimator on trajectories of length 50 with various standard 
errors drawn from gamma (B) or uniform distributions and all other parameters were set to 1 to study the effects of 
relative localization error. In both (A) and (C) a fiducial line shows that after 30 observations the risk for the methods 
decreases approximately ex 1/N with trajectory length; in this regime the SLE estimators perform worse by a constant 
factor (to simulation precision) relative to the VLE estimators. In (B) and (D) the risk for SLE estimators increase at a 
faster rate than the VLE estimators, indicating that the localization errors provide valuable information for improving 
estimator reliability. 

Figures 1A and C are shown in log space to show that in the presence of sufficient information the risk of the VLE 
estimators is less than the risk of the SLE estimators by a constant proportionality factor when a scalar localization 
error is used to parameterize a continuous distribution of localization errors. In other words, for a given parameterized 
scalar localization error, the amount of observations required to generate the same quality D estimate is always less 
by a proportional factor for the VLE estimators method compared to the SLE estimators. Figures IB and D are shown 
in log space to show how each estimator begins to fail in the presence of increasing relative localization error given 
a fixed set of trajectory observations (50). For these subplots, the VLE estimators show a noticeable improvement in 
estimator reliability when the relative error is equal to or greater than the true underlying D. 
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In an experimental trajectory, the distribution that parameterizes the localization error is typically a function of 
several environmental variables so that it can often appear arbitrary or specific to a particular experimental trial. 
In this manuscript, we focus on two simple distributions to provide a fair metric for validation over thousands of 
simulated trajectories. It is worthwhile to further investigate the the precision increase of the VLE estimators over 
their SLE counterparts with localization error distributions of varying error variances. We do so by performing trials 
on trajectories with localization errors parameterized by the Gamma and Uniform distribution, but this time we vary 
the parameters that characterize the variance of these distributions without altering the value of the mean localization 
variance. To do so, we run simulations where the shape parameter, k, of the gamma distribution is altered so that our 
expression looks like 

y/V = a ~ Gamma(/c, (W)/k), 

and the bounds of the uniform distribution are altered so the expression becomes 

y/V = a ~ Uniform([l — b\(W), [1 + b\(W)). 

(A) (B) 




Trajectory Length N Trajectory Length N 


Figure 2: Risk ratio of the Scalar Localization Error (SLE) and Vector of Localization Errors (VLE) MLEs for various 
trajectories parameterized by localization errors drawn from Gamma and Uniform distributions of a single varying 
parameter but the same mean (y/V ). Log-log plot (A) shows the risk ratio of SLE and VLE with 5 Gamma distributions 
of the same mean localization error and different shape parameters, k. The increased value of k reduces the variance 
of the gamma distribution; k = 1 is an exponential distribution and k = 13 is approaching a gaussian distribution. 
All of the gamma distributions in (A) have a variance of ( y/V) 2 /k . Log-log plot (B) shows the risk ratio of SLE 
and VLE with 5 Uniform distributions of the same localization variance and different sampling boundaries, b. The 
reduced value of b reduces the variance of the uniform distribution; a given b in plot (B) corresponds to a variance of 

((W)b) /3, hence b = 0.1 is nearly a variance of 0. As long as the variance of (y/V) is greater than 0, the VLE 
estimators outperform the SLE estimators. 


We see in Fig. 2 that the effect of increasing the variance relative to the mean of the localization errors in these test 
distributions results in a growing disparity between the two classes of estimators. From Fig. 1 the effect of increasing 
{y/V) sets a minimum trajectory length where estimates become reasonable; e.g the linear decrease in risk for the 
gamma distribution of Fig. 1A is seen at shorter trajectories than the uniform distribution of Fig. 1C even though both 
the gamma and uniform distributions have the same variance because the uniform distribution has a larger {y/V). In 
Fig. 2 the constant precision improvement of the VLE estimators are recognized in the regime where both estimators 
start reporting risk values that scale linearly with trajectory length. Prior to that, the VLE estimators start performing 
significantly better at shorter trajectories, which is represented by the peaks seen in Fig. 2. These simulations, while 
overly simplified versions of real SPT trajectories, highlight the importance of characterizing each localization error 
accordingly to their associated localizations in a trajectory. 
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8 Discussion and Conclusion 


Starting from the fundamental diffusion likelihood expression (Eq. 9), we presented three independent solutions, each 
of which has different benefits, and leads to a computational algorithm with different advantages. The recursive method 
presents the simplest solution that is numerically more stable than the other methods for estimating likelihoods when 
D approaches 0. The Laplace method has the advantage that the expected true positions X are computed along with 
the likelihood, which may be useful in some applications. The Markov method was crucial for deriving the terms 
e-i in the components AT; given knowledge of the true underlying probability distribution, hence the generality of 
the Markov method is its main advantage. In terms of numerical accuracy and computational efficiency the Markov 
method is better than the Laplace method especially for very small D values, but it remains computationally inferior 
to the recursive method. Lor practical implementations, we recommend the recursive method unless the MLE of the 
true positions is also desired. 

The method described here naturally allows for trajectories with missing or irregularly spaced localizations by 
decoupling the concept of observation times t, from the exposure time t e . This is important when some localizations 
are missed because the gap between f; and / ?+1 becomes larger, but the exposure time remains the same. Thus to 
correctly account for the motion-blur, the effective weighting of the dependence of observed position O; on x, and x j+ -[ 
changes, and our technique directly incorporates this effect. Trajectory intermittency has been accounted for in prior 
studies (19) and in those same studies extensions to dynamic errors were suggested, but a convenient computational 
framework for an estimator that seamlessly factors in both trajectory intermittencies and dynamic error had not been 
explicitly worked out until now. 

Numerical implementations of the likelihood forms resulting from the three derivations were tested to justify 
the equivalence among the likelihood forms. Since all three derivations began from the same set of first principles, 
the three likelihood calculations are essentially equivalent. We note however that our implementation remains unit 
agnostic, so that the trajectory with a very small D value can be easily scaled up to appropriate units so that the value 
of D is closer to 1, where the numerical calculations will be more robust. 

Variable localization uncertainties could occur in practice from variable background intensities or photobleaching 
of the fluorescent label. We compared the performance of our VLE estimator to the current state-of-the-art SLE 
estimator using the squared log loss function and found a clear performance benefit when trajectories had variable 
localization uncertainties. 
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11 Implemented Notation and Gaussian Functions 


The derivations presented in the main text make heavy use of normalized Gaussian functions, which we denote as a 
function of three arguments. 


A f(a, b, v) = 


1 


7 TV 


exp 


(a-b ) 

2 v 


21 


The Gaussian function defined this way is symmetric with respect to the first two position parameters, so that Af(a, b, v ) 
Af(b, a, v), and the variance of the function is given by the third parameter. Also, the normalization factor ensures that 
the Gaussian integrated over all space with respect to either of its position parameters is unity. 


/ OO /»oo 

do b, v) = / d& J\f(a , 6 , v) 

-oo J — OO 


= 1. 


As a corollary, a normalized Gaussian has a useful scaling identity, for q > 0, 

Af(a, 6 , v ) = qj\f(qa, qb, q 2 v). 

Next, consider the case of the product of two normalized Gaussians sharing a common position parameter, which 
can be rewritten as a product of two normalized Gaussians where the common parameter only appears in one of the 
two, 

J\f(x, Af(x, /i 2 , r? 2 ) = J\f(ni, /x 2 , Vi + r/ 2 ) N(x, n', rj'), (36) 


where. 


, mm + H2m . / 

/x = - and, rj = 


Vim 


Vi + m Vi + Vi 

Using Eq. 36, the integral of the product of two normalized Gaussians over a shared position parameter is itself a 
normalized Gaussian in the other two (unintegrated) position parameters, 

J da; N{x,ni,Vi)N(x,m,Vi) = + Vi) J dx = AT(m,m,Vi+Vi)- (37) 


With two successive applications of Eq. 36, the product of three Gaussians which share a common position parameter 
becomes. 


A f{x, m,vi) N(x, H2,m) N{x, n 3 , m) 


where, qi" 
and, rj" 


Nim, m,vi + vi) N{x, v')N{x, r? 3 ), 

Af(m,m,Vi + Vi) AC(/x 3 , /x', 77' + 773) A/ - (x, v" , v")i 
v'v3 + mv' 

V' + V3 
V'V3 
V' + V3 ' 


(38) 


Finally, Eq. 38 allows the integral of three normalized Gaussians over a shared position parameter to reduce to 


da; Af(x, vi,Vi)-^(x, /x 2 , 772 ) N{x, /x 3 , 773 ) 


where, 7 


AA(/Xi, /x 2 , 771 + 772 ) A/’(/x 3 , /x', 77 ' + 773 ) ./ da; Af(x,v",v") 

AA(/xi, /x 2 , ? 7 i + 772 ) A/’(/x 3 , n', v' + ? 7 3 ) 

AA(/xi, /x 2 , ? 7 i + 772 ) A/’(/x 3 , /x', 7), 

V 1 V 2 + 771773 + 772773 
Vi + Vi 


(39) 
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12 Problem Formulation 


In the main manuscript, the likelihood distribution for a ID random walk given a set of observations, O = {o, ,, is 

described as an integral that marginalizes over the N + 1 unknown true positions X = 

N 

P(0 \D)= dX TTP(o i |*i,* j+ i)P(a: i+1 |a: i ). (40) 

XRN + 1 ^ 

We seek to rigorously define our formalism to justify the analysis used in the rest of the manuscript. 

12.1 The Probability of a Single Observation and the True Start Positions 

Under typical biological SPT experiments, the temporal resolution of a probe is high enough that the effects of diffrac¬ 
tion are much greater than the effects of particle motion in generating the point spread function of the image. Following 
the assumption that the variance due to diffraction is significantly greater than the variance due to motion, the point 
spread function can be defined as a stationary Gaussian function centered about the average position of the particle dur¬ 
ing a single frame with some background offset. With these considerations, we now set to define P(cy , x, , Xj+i | D): 
the probability of an observation and the particle’s true start and end points in a frame given the free diffusion model. 
Since the maximum likelihood estimator for a Gaussian function returns the peak (the maximum likelihood) and vari¬ 
ance (the error) of a gaussian distribution, there is sufficient information from the estimator to build a probability 
distribution relating the localization to the true averaged position of a particle. The probability of obtaining a localized 
position, Oi, given the true averaged position of the particle, y ,, and the estimator variance, v r , is 

P(oi\Vi) = Af(oi,yi,Vi). (41) 

We incorporate more information on y, by relating it to the start positions at the considered frame, i, and the subsequent 
frame, i + 1; where it is assumed that a frame begins immediately after the prior frame ends. Therefore the probability 
distribution of cy with frame start coordinates, x,; and Xj+i, is expressed as 

P(oi,Xi,Xi +1 \D) = J d yi P(oi\yi) P(yi\xi,x i+1 ) P(x i+1 \xi) P(xi), 

where the diffusion constant, D, is implicitly included in the probability distributions. The probabilities P(xj+i | x,) 
represent a pure diffusion process. 

P(x i+ 1 \ xi) = Af(x i+1 ,Xi,cJi(D)), (42) 

where u>i(D) = 2 DSti and Sti = tj + \ — ti. 

In section 13, we will explicitly derive P{tji \ x t , Xj+i), but here we will state the result as 

P(Vi \ Xi , Xi+1 ) = AI (y h (l - Xi + x i+ i,2 Dt e - - ^ J , 

where t f is the exposure time of a frame; in other words, the time the last photon observed in one frame can be 
arbitrarily spaced from the first photon in the next frame. Hence, trajectory intermittencies can be accounted for by 
redefining Sti so that frame i + J is the next frame that observes a photon from the particle under consideration, 
omitting all frames that do not provide measurement information. Given that P{tji \ Xi,Xi+ 1 ) is in the form of a 
Gaussian function with y, as one of the location parameters, y; can be effectively marginalized in P(oi, Xj, Xj_|_i | D) 
so that the expression reduces to 

P(oi,Xi,x i+ 1 1 D) = P(oi | Xj,x i+ i)P(x i+ i | Xi)P(xi). 

When obtaining the global probability of all O given D, the probabilities P(xj) will be conditioned on prior observa¬ 
tions, so it becomes necessary to define the expression 

P(oi, x i+1 | Xi, D) = P( 0i | Xj, x i+ i) P(x i+ i | Xi). 
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12.2 The Probability of a Trajectory of Observations 

We now wish to get the full expression for P{0 \ D), which has no dependency on the true positions, A. To do this, 
first define the expression 


N N 

P(0,X\D)=P( Xl )l[P( 

Oil •t'i+l I ‘ 77) P( x l') t f P(®i | x ii x i-\- 1) P( x i -\-1 | x i ) • 

i=l i=1 

Since A' are nuisance parameters, integrate the X variables over all configuration space and set P(x i) = 1 because it 
is assumed that x \ must already be known, since the inference of a diffusion probability must have an origin to relate 
all subsequent coordinates. 


P(0\D) 


i N 

/ TT P (°i I x i+l) P ( x i+1 | x i) • 

J *»+! ZX 


(43) 


We will see in section 13 that the following expression, which is the focus of the main manuscript, is equivalent and 
easier to evaluate 


N N-l N N-l 

P(0 | D) = / dA TT U{ 0 i, Xi ,ei{D)) TT A f(x l+1 ,x z ,iOi(D)) = / dA T\M t FT 7". (44) 

J ^ N r=i j=i i=i 

where 

A4 ?; = =Af( 0 i, Xi ,£i(D)), for 1 < i < N, and (45) 

Ti = Ti(xi+i,Xi) =Af(x i+ i,Xi,Wi(D)), for 1 < i < AT — 1. (46) 


13 Explicit Derivations on the Problem Formulation 

There were a few results presented in section 12 that are clarified in this section. The probability distribution of 
an averaged position given the start and end points of a frame are discussed and analytically solved in the continuous 
limit. Then the relation between representation of P(0 \ D) with the probability distribution and reduced measurement 
functions is investigated. 

13.1 The Probability Density of a Time Averaged Position 

Given D, the probability density of a transition from point a to to point b separated by a time T is 

P(b | a) = Af(b, a, 2DT). 

If an intermediate point, y(t) sampled at a time t < T is considered, the joint probability density of a transition from 
a to y(t) and then from y(t) to b is 

P(y(t),a , b) = P(a) Af(y(t),a, 2Dt)AT(b, y(t),2D(T - t)) 

= P(a) Af(b, a , 2 DT) Af (y(t), a (l — |) + 2D^ (T - t)) . 

The probability density of the variable y(t), preconditioned on the end points, a and b, is then 

P(y(t) | b,a) = AT (y(t),a ^ ~ f'j + b ^ 2D ^( T - *)) • (47) 
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Eq. 47 is the probability density for what is known as a Brownian Bridge (29), or Brownian motion with preconditioned 
end points. It has a mean and covariance defined as 


(y(t)) = a 1 - - + 6 - 
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st 


cov[t/(f), y(s)\ = 2D \^s - — J for s<t<T 

It is now of interest to find the probability density for a quantity that describes an integrated average of y(t) such that 

1 f te 

y= - d ty(t). 


t 


s Jt =0 


Since each y(t') is a normally distributed random variable, y(t) is a gaussian process, then the time averaged integral 
of a Gaussian process, y , is also a normally distributed random variable. Therefore, from Isserlis theorem (25) only 
the first two moments of y are needed to determine its probability distribution. The first moment is 
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where for notational convenience, we define a = t e /2T. It follows that the second moment is 
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Given that y must be a normally distributed and its first two moments are known, then 

p {y I a,b)= Af(y, (y ), (y 2 ) - {y) 2 ) = Af (y, (1 - a)a + ab , 2 Dt, 


1 a 
3 _ 2 

Furthermore, it was established in Eq. 41 that a time averaged position was related to a localized observation by a 
normal Gaussian function so that 


P(o | a,b)= J d y P{o \ y) P{y \a,b)= Af (o, (1 - a)a + ab,v + 2 Dt e ^ ^ ^ 


(48) 
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13.2 Functional Form: From Products of Probability Components to Simpler Expressions 


In the limit where the camera exposure time goes to 0, the probability of o,; is dependent on one coordinate, a;,. 
However, as the camera exposure time becomes non-negligible with respect to the time spacing between frames, the 
probability of o,-, becomes increasingly dependent on the subsequent coordinate, ,x' ?+1 . The o, dependence on both Xi 
and make a direct approach to solving the integral computationally difficult. In the main manuscript, Markov’s 
method approach showed the following relationship for a multivariate gaussian 

= ( s i s j ) > 

where E is the covariance matrix of a multivariate Gaussian function describing the vector of random variables S = 
{.s, = o l+ i — Oi}^ 1 . Given that our probability distribution is obtained by integrating several Gaussian functions, 
the result of the distribution is a Gaussian function. Therefore, if the moments of S are known, the parameter e, for 
the simpler expression can be derived given that 

(siS,:+i) = — Cj+i 


was previously shown to be true for the simpler expression in the main manuscript. Starting from our derived proba¬ 
bility expression in Eq. 48 for an arbitrary o, 


P{oi | Xi,Xi+i) = Af ^o.j, (1 - aii)xi + ctiXi+i,Vi + 2Dt e 

where v, is defined as the localization variance and 
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From the properties of a Gaussian function with 0 mean 

(Oi - (1 - Cti)Xi - OtiXi+i) = 0, 

(1 (Xi)Xi ) Vj 4" 2Dt e 


1 _ a, 
3 2 


(49) 


Which implies 


{Oi -x^ = 0 

{Oi - Xi\X) = Oli(Xi + i - Xi) 
((Oi - Xi) 2 ) =Vi+ 2-Dfe^- 


Also 


(&i) — Oi) 

Given the relations in Eq. 50 and Eq. 51 


(pi-\-1 2 Ci- tl) {oi Xi) + ( Xi-\-\ 


Xi) = 0 . 


(50) 

(51) 


(SiSi+i) — {(Oi+ 2 — Oi+l)(Oi+l — Oi)) — — e,:+i 

2 Oi + i -f Xi- j_2 ^z+2 4“ 1 Oi - (- %i -\-1 T Xi Xi)) 

= ((o i+ 1 - Xi)(x i+ 1 - Xi)) - ((o »+1 - x i+1 ) 2 ) H- 

“ Oi+i()2D5ti-\-i) 2 Dt e — Vi-\.\ = 2 Dt e — Vi+ 1. 

3 o 

Where all the other terms in the ellipsis (• • •) go to 0. Additionally, it follows that 

(s 2 ) = w.i + e,. + 1 = 2D8ti — 4Dt e - + Vi + 

Therefore 

£j(D) = Uj - 2Dt e ~. 

o 

Which is the variance correction discovered in earlier diffusion estimation papers (14, 17, 18). 
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14 Method Component Derivations 


The following sub-sections explain some of the relations that were explicitly stated to complete the derivations in the 
main text. 


14.1 Laplace Method: Maximum Likelihood of True Positions 

Recalling the objective function in the Laplace method 
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where i £ 2 : N — 1. The Hessian —lnVV/(X) = M, of Eq. 52 has the non-zero elements 
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Setting the gradient in Eq. 53 equal to 0 and moving the constants to the left hand side of the equation gives 
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With additional factoring, the expression looks like 
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The factored expression on the right can be expressed in terms of a vector product of the Hessian matrix and the 
maximum likelihood of the true positions, M ■ X. We invert the Hessian matrix to bring it to the other side of the 
equation so that the resulting expression for the maximum likelihood looks like 

X = M _1 0, 



14.2 Laplace Method: Direct Integration of the Probability Distribution Components 

Starting from the probability component formalism 

N N 

f(X) = P[-P(o j; I Xi,x i+ 1 ) P(Xi I x i+1 ) = P[ Af(0i, (1 - a t )x-i + a.iX i+ll q t ) Af (xi,x i+1 ,u)i), 

i=1 i=1 

where qi is the variance due to the observation and a, = A- • We solve for the Hessian of our objective function 

M = — Win f(X) 
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The form is similar to our functions, so we can solve for the maximum likelihood in the same fashion 


X = M -1 0 


Where the components of 0 are modified as 


(1 — Oti)Oi CXi-iOi-1 


Qi 


qi -1 


and for completeness we set ao = 0 and cin+i = 0. 

14.3 Markov Method: Marginalizing the Likelihood Function 

To understand how the integration of S' on P(0 \ D ) behaves, lets first consider the integration of O on P[0 \ D) with 
one o, held constant, which is equivalent to multiplying P(0 \ D) with a delta distribution i5(o' — Oi). The integral of 
P(0 | D) and the delta distribution with respect to O is of the form 


N N -1 


f d O 5(4 - Oi)P(0\D) = fdOdX 5(4 - o t )f[M t [] % 

J J »=1 .7=1 

r N N -1 
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Shuffling the order of integration, so that the O basis is integrated first allows us to marginalize all O except for n ,. 
There are then N terms of X which can be effectively marginalized 


JV-l 


/ dX dO (5(o- - Oj) JJ N(oi,Xi,£i) Af(x j+1 ,Xj,Uj) 
»=1 j —1 

r N-l 

= / dX AT(xj + i,Xj,Uj) = 1. 


Now we wish to perform a similar integration, but this time on the basis of S'. The integration of P(() \ D) with 
respect to O is a completely different function than an integration with respect to S', but we wish to show that the 
integration on S' yields analogous to results to integration on O with one o, held constant. If o, = o' is held constant, 
than we can directly express s' = Oj+i — o' in terms of one variable, Oj+i, if i < N. We can also express s' i _ 1 in 
terms of o*_ i if i > 1. Analogously, every oi + k can be expressed as 
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We perform this substitution with an arbitrary o, held fixed to express the integral over ,S" as 
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We can then shuffle the order of integration so that we can iteratively integrate the components of the S basis, essen¬ 
tially the components furthest from o', that are expressed in only one of the univariate gaussian functions that comprise 
P(0 | D), effectively performing a sequential marginalization 
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(54) 


From this result, we see that holding a single o, fixed is quite arbitrary, as the term is eventually marginalized by its 
associated X{. It is also apparent that fixing a particular o, allows complete isolation of a particular s, basis if all 
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other Sk bases are marginalized. However, if we wish to evaluate s, and Sj_i components with o, fixed, we see in the 
integral expression that there will be some correlation between adjacent displacements. Most importantly, we see that 
P(0 | D) is a normalized probability density under S'. 

14.4 Markov Method: Expectation Calculations 

We shall solve for the expectation values on S' over P(0 \ D). In order to do so, we will use the trick in Eq.54 where 
we hold one observation constant until the appropriate substitutions are taken. Solving for (s,) in this spirit yields 
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where the terms 3 t and 7 j are intermediate variables that are generated from rearranging Gaussian functions. These 
variables are immediately marginalized by integration on X, so we shall omit their explicit representation. Aside from 
the fact that the expectation value on the displacement means are all 0 , we also discover another striking property from 
Eq. 55, that is, the marginalization of all other s' and all X isolates the expectation calculation to components of s' 
that are independent of all other marginalized variables. The same results from Eq. 55 are used so that the expectation 
of the variance is 
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Now on to the covariance terms; starting with adjacent displacements (sjSi+i). However, if a given o, is fixed, 
there is a dependence between adjacent displacements. With the techniques implemented in Eq. 55 a bivariate gaussian 
function is isolated from the rest of the integral expression. 
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Where the substitution of variables required for integrating the expectation of (s's' +1 ) induces a bivariate Gaussian 
function with location parameters Sb = [s», Si+i] T and covariance matrix 




w* + G + G+l — G+1 


Conversely, when evaluating (s.iSi +k ) where fe > 1, it is necessary to temporarily fix o, as well as o, + fc, this results in 
isolating two univariate Gaussian functions which results in 
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15 Implementation 

15.1 Log-product Computation 

A common algorithmic problem shared by each of the three methods is the need to compute logarithms of products. 
A straight forward method is to utilize the identity 


( jv \ N 

n* = Y. 111 rij , (56) 

i—1 / i=l 

the left side of which requires one logarithm and N — 1 multiplications, while the right side requires N logarithms 
and N — 1 additions. Directly using the left side of Eq. 56 in computations can lead to numerical over- or underflow, 
while the N logarithms required for the right side can dominate the computational costs, taking up the majority of the 
computational cycles for each of the three algorithms. Thus, to minimize the number of logarithm computations, yet 
still maintain numerical accuracy, we utilize a hybrid log-product implementation to evaluate forms like Eq. 56. The 
log-product method builds up a product of a r values, only taking a logarithm when multiplying by the next a, would 
lead to loss of precision, overflow, or underflow. 

15.2 Computation of Variance terms 

Given the inputs, each of the method firsts compute the N variance terms due to measurement, £i(D) - v t — Dt e / 3, 
and the N — 1 variance terms due to diffusion, uJi(D) = 2DSti. Any of these variance terms can be arbitrarily close 
to zero, so to prevent numerical instabilities we bound these terms away from zero by at least machine epsilon, while 
preserving the sign which can be negative for Si(D). 

15.3 The Recursive Method Algorithm 


Listing 1: The Recursive Algorithm for the log-likelihood calculation in C++. This fundemental algorithm can be 
impemented with just the log function from the C++ standard math library. We rely on the variances to be computed 
as described in Sec. 15.2, and the log-product computation as described in Sec. 15.1 


FloatT recusiveLLH (i nt N, 



const 

FloatT 

Obs [] 

const 

FloatT 

dT[] , 

const 

FloatT 

vD[] , 

const 

FloatT 

vM [ ]) 


{ 
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FloatT alpha [N— 1]; 

FloatT eta = vD[0]+vM[0]; 

FloatT mu = Obs[0]; 

FloatT LLH = 0; 
for(int n=l ;n<N— l;n++){ 

FloatT temp_alpha = vM[n]+eta; 
alpha[n —1] = temp_alpha; 

FloatT temp_diff = (Obs[n]—mu); 

LLH += temp_diff * temp_diff / temp_alpha ; 
mu = (mu*vM[ n] + Obs [ n] * eta )/temp .alpha ; 
eta = vM[n] * eta / temp_alpha+vD [ n ]; 

} 

alpha [N-2] = vM[N-1]+ eta ; 

LL H += (N—l)*log2pi ; 

LLH += logprod(alpha); 

FloatT temp_diff = (Obs [N—1]—mu); 

LLH += temp _d iff * temp _d iff / alpha [N—2]; 

LL H *= -0.5; 
return LLH; 

} 

15.4 Tri-Diagonal matrix algorithms 

The Laplace and Markov method both require solving linear systems of the form Ax = b, where A is symmetric 
tri-diagonal (all non-zero terms are on the main diagonal or those diagonals immediately above and below the main 
diagonal). A naive solution based on inverting matrix A has computational complexity O (A ?3 j, where N is the 
length of the trajectory. However because A is tri-diagonal there are established algorithms (30) that use the Gaussian 
elimination strategy to solve the system in time O (TV). Furthermore, the determinant of the matrix can be shown to 
follow a recurrence relation (31) that leads to a linear time computation. Combined with the log-product computation 
described in Sec. 15.1, this leads to a fast algorithm for computing log(det(A)) with a minimum number of calls to 
the logarithm function, and time complexity O (N). We make use of these algorithms for the Laplace and Markov 
method implementations. 

15.5 DST Algorithm 

The DST algorithm code is based on the Matlab code provided by the authors (16). Our implementation is as faithful as 
possible to the original implementation but there are a few caveats that should be mentioned. First because we assume 
that the localization variances are known, we use the form of the estimator that takes in the mean localization variance 
as an input parameter. Because the DST can only incorporate a single localization variance we use the mean of the 
given localization variances. For the R parameter we use t e /6 as was derived for uniform exposure intervals (16). 
The original implementation also leaves off the constant term —(1/2) (A — 1) log(27r) from the likelihood calculation. 
While for MLE estimation this is not important, it does become important for other downstream analysis using the 
likelihood, and our algorithms do include this term, so we have added that in to the DST to make the plotted com¬ 
parisons more fair. Also, we ensure that the resulting values are always real as the original code can return complex 
floating point numbers, but only the real parts have meaning. Finally, because of the reliance on the discrete sine 
transform function which is provided by Matlab, but not available natively in the C++ standard math library, the speed 
results from the main paper only test the provided Matlab implementation. 
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